NSF PAR Search | NSF Public Access Repository

RECAPHE: REconfigurable Polynomial Modular Computation Architectures for Unified PQC and HE Schemes

Wang, Antian; Zhang, Kaiyuan; Parhi, Keshab K; Lao, Yingjie (December 2025, Asilomar)

Full Text Available
HEDWIG: Homomorphic Encryption Accelerator Design Using BFV-HPS With HiGh-Speed Fixed-Point Approximation

https://doi.org/10.1145/3706628.3708853

Wang, Antian; Tan, Weihang; Xu, Zhenyu; Wei, Tao; Ding, Caiwen; Parhi, Keshab K; Lao, Yingjie (February 2025, ACM)

Full Text Available
HERMES: Homomorphic Encryption over Residual Number System for Multi-level EvaluationS

https://doi.org/10.1145/3676536.3697124

Wang, Antian; Zhang, Kaiyuan; Parhi, Keshab K; Lao, Yingjie (October 2024, ACM)

Homomorphic encryption enables computations on the ciphertext to preserve data privacy. However, its practical deployment has been hindered by the significant computational overhead compared to the plaintext computations. In response to this challenge, we present HERMES, a novel hardware acceleration system designed to explore the computation flow of the CKKS homomorphic encryption bootstrapping process. Among the major contributions of our proposed architecture, we first analyze the properties of the CKKS computation data flow and propose a new scheduling strategy by partitioning the computation modules into general-purpose and special-purpose modular computation modules to allow smaller resource consumption and flexible scheduling. The computation modules are also reconfigurable to reduce the memory access overhead during the intermediate computation. We also optimize the CKKS computation dataflow to improve the regularity with reduced control overhead.
Full Text Available
Hardware Acceleration for Fully Homomorphic Encryption Scheme Switching from CKKS to FHEW

https://doi.org/10.1109/IEEECONF60004.2024.10942749

Zhang, Kaiyuan; Wang, Antian; Parhi, Keshab K; Lao, Yingjie (October 2024, IEEE)

Fully Homomorphic Encryption (FHE) presents a paradigm-shifting framework for performing computations on encrypted data, offering revolutionary implications for privacy-preserving technologies. This paper introduces a novel hardware implementation of scheme switching between two leading FHE schemes targeting different computational needs, i.e., arithmetic HE scheme CKKS, and Boolean HE scheme FHEW. The proposed architecture facilitates dynamic switching between the schemes with improved throughput and latency compared to the software baseline. The proposed architecture computation modules support scheme switching operations involving coefficient conversion, modular switching, and key switching. We also optimize the hardware designs for the pre-processing and post-processing blocks, involving key generation, encryption, and decryption. The effectiveness of our proposed design is verified on the Xilinx U280 Datacenter Acceleration FPGA. We demonstrate that the proposed scheme switching accelerator yields a 365× performance improvement over the software counterpart.
Full Text Available
RECAPHE: REconfigurable Polynomial Modular Computation Architectures for Unified PQC and HE Schemes

https://doi.org/10.1109/IEEECONF67917.2025.11443610

Wang, Antian; Zhang, Kaiyuan; Parhi, Keshab K; Lao, Yingjie (October 2025, IEEE)

Post-Quantum Cryptography (PQC) and Homomorphic Encryption (HE) are emerging security primitives that strengthen data protection against adversaries equipped with quantum computing capabilities. Although PQC and HE rely on similar underlying arithmetic operations, their hardware implementations are typically developed independently due to differences in key parameters such as polynomial length and modulus bit-width. This work presents a unified lattice-based polynomial modular accelerator that efficiently supports both PQC and HE primitives, bridging these two domains toward future secure computing architectures. The proposed design introduces highly reconfigurable modular computation units that enable low-overhead runtime configuration across the parameter ranges commonly used in PQC and HE schemes. This unified architecture eliminates the need for separate domain-specific accelerators by reusing shared computation structures and workload patterns across both cryptographic schemes.
Full Text Available
NNTesting: Neural Network Fault Attacks Detection Using Gradient-Based Test Vector Generation

https://doi.org/10.1109/DAC56929.2023.10247885

Wang, Antian; Zhao, Bingyin; Tan, Weihang; Lao, Yingjie (July 2023, IEEE)

Full Text Available
PaReNTT: Low-Latency Parallel Residue Number System and NTT-Based Long Polynomial Modular Multiplication for Homomorphic Encryption

https://doi.org/10.1109/TIFS.2023.3338553

Tan, Weihang; Chiu, Sin-Wei; Wang, Antian; Lao, Yingjie; Parhi, Keshab K. (January 2023, IEEE Transactions on Information Forensics and Security)

High-speed long polynomial multiplication is important for applications in homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). Parallel NTT and iNTT architectures are proposed to reduce the number of clock cycles to process the polynomials. Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. Our proposed architecture, namely PaReNTT, makes three novel contributions. First, cascaded parallel NTT and iNTT architectures are proposed such that any buffer requirement for permuting the product of the NTTs before it is input to the iNTT is eliminated. This is achieved by using different folding sets for the NTTs and iNTT. Second, a novel approach to expand the set of feasible special moduli is presented where the moduli can be expressed in terms of a few signed power-of-two terms. Third, novel architectures for pre-processing for computing residual polynomials using the CRT and post-processing for combining the residual polynomials are proposed. These architectures significantly reduce the area consumption of the pre-processing and post-processing steps. The proposed long modular polynomial multiplications are ideal for applications that require low latency and high sample rate such as in the cloud, as these feed-forward architectures can be pipelined at arbitrary levels. Pipelining and latency tradeoffs are also investigated. Compared to a prior design, the proposed architecture reduces latency by a factor of 49.2, and the area-time products (ATP) for the lookup table and DSP, ATP(LUT) and ATP(DSP), respectively, by 89.2% and 92.5%. 
Specifically, we show that for n = 4096 and a 180-bit coefficient, the proposed 2-parallel architecture requires 6.3 Watts of power while operating at 240 MHz, with 6 moduli, each of length 30 bits, using a Xilinx Virtex UltraScale+ FPGA.
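As a rough software illustration of the CRT decomposition described in this abstract, the sketch below splits each coefficient into residues modulo several small coprime moduli, multiplies the residual polynomials independently modulo x^n + 1, and recombines the results. It is not the PaReNTT hardware design: the function names are hypothetical, the moduli are toy values rather than 30-bit special moduli, and a schoolbook negacyclic product stands in for the parallel NTT/iNTT pipeline.

```python
from math import prod


def negacyclic_mul(a, b, q):
    """Schoolbook product of a and b modulo (x^n + 1, q).

    Stand-in for the NTT-based multiplier; assumes len(a) == len(b) == n.
    """
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                # x^n = -1, so terms that wrap around are subtracted.
                out[k - n] = (out[k - n] - ai * bj) % q
    return out


def crt_combine(residues, moduli):
    """Recombine residues into one value modulo the product of the moduli."""
    big_q = prod(moduli)
    total = 0
    for r, q in zip(residues, moduli):
        q_hat = big_q // q
        # pow(x, -1, q) is the modular inverse (Python 3.8+).
        total += r * q_hat * pow(q_hat, -1, q)
    return total % big_q


def rns_poly_mul(a, b, moduli):
    """Multiply per modulus (pre-processing), then recombine (post-processing)."""
    per_modulus = [
        negacyclic_mul([c % q for c in a], [c % q for c in b], q)
        for q in moduli
    ]
    return [
        crt_combine([p[i] for p in per_modulus], moduli)
        for i in range(len(a))
    ]


# Toy parameters (assumptions): three small pairwise-coprime moduli, n = 4.
MODULI = [97, 101, 103]
A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
RESULT = rns_poly_mul(A, B, MODULI)
```

Because each residual product is independent of the others, the per-modulus multiplications can run in parallel, which is the property that motivates the decomposition in hardware.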
Full Text Available